Visualizing CNNs using timm-vis
Showcase of the timm-vis library
There are numerous methods for understanding a convolutional neural network by visualizing it. However, most repositories and libraries that implement these techniques only work for specific models, such as a VGG or AlexNet pretrained on ImageNet. So, I created timm-vis, a library that provides a variety of visualization methods that work on any model trained on any dataset. The only requirement is that the model is an image classifier built with PyTorch. The eight visualization techniques are described in detail below. If you would like to try them out, you can start with details.ipynb in the repository.
from timm_vis.methods import *
from PIL import Image
import timm
Throughout the notebook, an EfficientNet B0 pretrained on ImageNet and an image of a dog ("chow chow" - class 260) will be used.
model = timm.create_model(model_name = 'efficientnet_b0', pretrained = True)
img = Image.open('chow.jpg').resize((512, 512))
img
1. Visualize filters
The visualize_filters function plots the filters of a convolutional layer by interpreting each channel as a grayscale image.
Parameters:
- model: PyTorch image classifier
- filter_name: name of the layer whose filters are visualized, defaults to the first layer
- max_filters: maximum number of filters to be displayed, defaults to 64
- size: size to which the filters are upsized, defaults to 128
- figsize: size of the pyplot figure, defaults to (16, 16)
- save_path: path where the generated plot is saved, defaults to None
Below, 25 filters of the second convolutional layer (named 'blocks.0.0.conv_dw.weight') are plotted. The name of a layer can be found by iterating over model.named_parameters(). If the number of filters exceeds max_filters, max_filters random filters are plotted.
visualize_filters(model, 'blocks.0.0.conv_dw.weight', max_filters = 25)
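The core idea can be sketched in a few lines. This is a minimal, hypothetical illustration (not timm-vis's actual code): grab a conv layer's weight tensor and min-max normalize each channel so it can be shown as a grayscale image. The layer and helper name here are made up for the example.

```python
import torch
import torch.nn as nn

# A standalone conv layer stands in for one layer of a real model.
conv = nn.Conv2d(3, 16, kernel_size=5)

def filters_as_images(weight, max_filters=64):
    # weight has shape (out_channels, in_channels, kH, kW)
    w = weight.detach()[:max_filters]
    # flatten the channel dimension: one grayscale image per channel
    w = w.reshape(-1, w.shape[-2], w.shape[-1])
    # min-max normalize each filter independently so it plots cleanly
    mins = w.amin(dim=(1, 2), keepdim=True)
    maxs = w.amax(dim=(1, 2), keepdim=True)
    return (w - mins) / (maxs - mins + 1e-8)

imgs = filters_as_images(conv.weight, max_filters=4)
print(imgs.shape)  # 4 filters x 3 input channels -> 12 grayscale images
```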
2. Visualize activations
The visualize_activations function plots the activations of a specific layer given a specific image.
Parameters:
- model: PyTorch image classifier
- module: layer whose activations are recorded
- img_path: path to image
- max_acts: maximum number of activations to be displayed, defaults to 64
- figsize: size of the pyplot figure, defaults to (16, 16)
- save_path: path where the generated plot is saved, defaults to None
The image at img_path is converted to a tensor and fed to the model. The outputs of module are stored and displayed. Each channel in the intermediate output tensor is interpreted as a grayscale image. Below are the activations of the model's first convolutional layer for the image of the dog.
visualize_activations(model, model.conv_stem, 'chow.jpg')
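Recording intermediate outputs like this is typically done with a PyTorch forward hook. Here is a minimal sketch of that mechanism on a made-up toy network (TinyNet and the `acts` dict are illustrative names, not part of timm-vis):

```python
import torch
import torch.nn as nn

# Toy classifier used only to demonstrate hooks.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)
        self.head = nn.Linear(8, 10)

    def forward(self, x):
        x = torch.relu(self.conv(x))
        return self.head(x.mean(dim=(2, 3)))

model = TinyNet()
acts = {}

def hook(module, inputs, output):
    # store the layer's output; each channel can later be plotted in grayscale
    acts["conv"] = output.detach()

handle = model.conv.register_forward_hook(hook)
model(torch.randn(1, 3, 32, 32))  # hook fires during the forward pass
handle.remove()
print(acts["conv"].shape)  # (1, 8, 32, 32): 8 activation maps
```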
3. Maximally Activated Patches
The maximally_activated_patches function plots the patches of the image that produce the maximal activations at the last layer while performing a forward pass with that image.
Parameters:
- model: PyTorch image classifier
- img_path: path to image
- patch_size: size of patch, defaults to 448
- stride: stride of the sliding patches, defaults to 100
- num_patches: number of patches
- figsize: size of the pyplot figure, defaults to (16, 16)
- device: device to use while computing patches, defaults to cuda
- save_path: path where the generated plot is saved, defaults to None
To find the maximally activated patches, parts of the image (patches) are occluded. The occlusions that produce the largest change in the predicted scores for the top class are ranked higher than those that produce minimal changes. The top num_patches patches of the image are plotted. Below are the top 5 patches of the dog image.
maximally_activated_patches(model, 'chow.jpg')
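The occlusion procedure described above can be sketched as follows. This is a simplified illustration with a random toy model and small patches, not the library's implementation: slide a zeroed-out square over the image, re-run the model, and rank locations by how much the top class's score drops.

```python
import torch
import torch.nn as nn

# Toy classifier; a real run would use the pretrained model instead.
model = nn.Sequential(
    nn.Conv2d(3, 4, 3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(4, 10),
)
model.eval()

img = torch.randn(1, 3, 32, 32)
with torch.no_grad():
    base = model(img)
    top_class = base.argmax(dim=1).item()
    base_score = base[0, top_class].item()

drops = []  # (score drop, top-left corner) for each occluded patch
patch, stride = 8, 8
with torch.no_grad():
    for y in range(0, 32 - patch + 1, stride):
        for x in range(0, 32 - patch + 1, stride):
            occluded = img.clone()
            occluded[:, :, y:y + patch, x:x + patch] = 0.0  # hide this patch
            score = model(occluded)[0, top_class].item()
            drops.append((base_score - score, (y, x)))

drops.sort(reverse=True)  # largest drop first = most important patch
print(len(drops))  # 16 patches on a 4x4 grid
```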
4. Saliency map
The saliency_map function plots the gradient of the predicted score with respect to each pixel in the input image.
Parameters:
- model: PyTorch image classifier
- img_path: path to image
- figsize: size of the pyplot figure, defaults to (16, 16)
- device: device to use for computation, defaults to cuda
- save_path: path where the generated plot is saved, defaults to None
The gradient of the unnormalized class score is computed with respect to the pixels of the input image. The absolute value of the gradient is taken, and then the maximum over the three color channels. Lighter parts of the map correspond to gradients of higher magnitude. Below is the saliency map of the dog image.
saliency_map(model, 'chow.jpg')
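The computation behind a saliency map fits in a few lines. A minimal sketch with a toy model (the real function works on the loaded image and pretrained classifier):

```python
import torch
import torch.nn as nn

# Toy classifier standing in for the pretrained model.
model = nn.Sequential(
    nn.Conv2d(3, 4, 3, padding=1),
    nn.Flatten(),
    nn.Linear(4 * 32 * 32, 10),
)
img = torch.randn(1, 3, 32, 32, requires_grad=True)  # track gradients w.r.t. pixels

scores = model(img)
scores[0, scores.argmax()].backward()  # gradient of the top class score

# abs, then max over the 3 channels; lighter pixels = larger gradient magnitude
saliency = img.grad.abs().max(dim=1).values
print(saliency.shape)  # (1, 32, 32)
```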
5. Generate synthetic image
The generate_image function generates a synthetic image that maximizes the score of a specific class.
Parameters:
- model: PyTorch image classifier
- target_class: the class whose score is maximized
- epochs: number of epochs to execute gradient ascent for
- min_prob: minimum probability of the target class; gradient ascent is interrupted if the confidence score for the target class is > min_prob
- lr: learning rate
- weight_decay: weight decay used for L2 regularization
- step_size: step size for the learning rate scheduler, defaults to 100
- gamma: gamma used for the learning rate scheduler, defaults to 0.6
- noise_size: size of the initial noise, defaults to 224
- p_freq: printing frequency, defaults to 50
- init: function used to initialize the noise, defaults to torch.randn
- device: device to use for computation, defaults to cuda
- figsize: size of the pyplot figure, defaults to (6, 6)
- save_path: path where the generated plot is saved, defaults to None
The input to the model is initialized using the init function. A forward pass is performed in order to compute the gradient of the target class's score with respect to the input. The input is then updated in order to maximize that score. This process is repeated for epochs iterations, or until the model predicts the target class with probability at least min_prob. The function call below generates a synthetic image for which the model predicts class 130 (flamingo) with a confidence of ~0.91.
synthetic_image = generate_image(model = model, target_class = 130, epochs = 500, min_prob = 0.9, lr = 10, weight_decay = 5e-2,
step_size = 100, gamma = 0.9)
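The core of this loop is plain gradient ascent on the input. A stripped-down sketch with a toy linear model (not timm-vis's exact code; the real function also applies weight decay and a learning-rate scheduler, and stops early once min_prob is reached):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy model; a real run would use the pretrained classifier.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 10))
model.eval()

target_class = 3
x = torch.randn(1, 3, 16, 16, requires_grad=True)  # the "image" being optimized
opt = torch.optim.SGD([x], lr=0.1)

start = model(x)[0, target_class].item()
for _ in range(50):
    opt.zero_grad()
    (-model(x)[0, target_class]).backward()  # ascent = descend on the negative score
    opt.step()
end = model(x)[0, target_class].item()
print(end > start)  # True: the target class's score has increased
```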
6. Fool model
The fool_model function modifies an input image such that the model's score for a target class is maximized.
Parameters:
- model: PyTorch image classifier
- img_path: path to image
- target_class: the class whose score is maximized
- epochs: number of epochs to execute gradient ascent for
- min_prob: minimum probability of the target class; gradient ascent is interrupted if the confidence score for the target class is > min_prob
- lr: learning rate
- step_size: step size for the learning rate scheduler, defaults to 100
- gamma: gamma used for the learning rate scheduler, defaults to 0.6
- p_freq: printing frequency, defaults to 50
- init: function used to initialize the noise, defaults to torch.randn
- device: device to use for computation, defaults to cuda
- figsize: size of the pyplot figure, defaults to (6, 6)
- save_path: path where the generated plot is saved, defaults to None
The input image is modified through the same procedure used in the generate_image function. The only difference between the two methods is that this method feeds the image at img_path to the model instead of a random tensor. The function call below modifies the image of a chow chow only slightly, yet the model predicts that the modified image is of class 724 (pirate ship) with high confidence.
adv = fool_model(model = model, img_path = 'chow.jpg', target_class = 724, epochs = 500,
min_prob = 0.9, lr = 5e-1, step_size = 100, gamma = 0.8)
Confirming that the model does predict the above image as class 724 (pirate ship):
model(adv).argmax()
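The same ascent loop starting from an image instead of noise looks like this. A hedged sketch on a toy model (here `img` is just a random tensor standing in for the loaded photo; timm-vis's actual implementation may differ):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy model; a real run would use the pretrained classifier.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 10))
model.eval()

img = torch.rand(1, 3, 16, 16)          # stands in for the loaded photo
adv = img.clone().requires_grad_(True)  # this copy is what gets modified
target_class = 7
opt = torch.optim.SGD([adv], lr=0.05)

before = model(img)[0, target_class].item()
for _ in range(100):
    opt.zero_grad()
    (-model(adv)[0, target_class]).backward()  # push the target score up
    opt.step()
after = model(adv)[0, target_class].item()

delta = (adv - img).abs().max().item()  # how far the image moved per pixel
print(after > before)  # True: the target class's score has grown
```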
7. Feature inversion
The feature_inversion function reconstructs an input image from the intermediate feature representations of one or more modules.
Parameters:
- model: PyTorch image classifier
- modules: list of modules
- img_path: path to image
- epochs: number of epochs to execute gradient ascent for
- lr: learning rate
- step_size: step size for the learning rate scheduler, defaults to 100
- gamma: gamma used for the learning rate scheduler, defaults to 0.6
- mu: regularization factor for the total variation regularizer
- device: device to use for computation, defaults to cuda
- figsize: size of the pyplot figure, defaults to (6, 6)
- save_path: path where the generated plot is saved, defaults to None
The feature vector of the input image at a given module is recorded. Another image is then generated by minimizing the sum of two terms: the distance between its feature vector and that of the original input image, and a total variation regularizer (required for the image to look "natural"). Below are the outputs of the function when an image is reconstructed from the outputs of the first, second, and last convolutional layers of the model. As seen below, earlier layers of the model tend to recreate images that closely resemble the input image. This shows that as the image passes through the model, information is lost.
modules = [model.conv_stem, model.blocks[0][0].conv_dw, model.blocks[-1][0].conv_pwl]
feature_inversion(model, modules, 'chow.jpg', 100, 1e-3)
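The total variation regularizer mentioned above is easy to write down. A common formulation is the sum of squared differences between neighbouring pixels (the library's exact variant may differ):

```python
import torch

def total_variation(img):
    # img: (N, C, H, W); penalize differences between adjacent pixels
    dh = (img[:, :, 1:, :] - img[:, :, :-1, :]).pow(2).sum()  # vertical neighbours
    dw = (img[:, :, :, 1:] - img[:, :, :, :-1]).pow(2).sum()  # horizontal neighbours
    return dh + dw

flat = torch.ones(1, 3, 8, 8)
noisy = torch.randn(1, 3, 8, 8)
print(total_variation(flat).item())       # 0.0 -- a constant image has no variation
print(total_variation(noisy).item() > 0)  # True -- noise is heavily penalized
```

Minimizing this term alongside the feature distance is what keeps the reconstruction from degenerating into high-frequency noise.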
8. Deep Dream
The deep_dream function modifies an input image in order to maximize the activations of an intermediate layer.
Parameters:
- model: PyTorch image classifier
- module: module whose outputs are maximized
- img_path: path to image
- epochs: number of epochs to execute gradient ascent for
- lr: learning rate
- step_size: step size for the learning rate scheduler, defaults to 100
- gamma: gamma used for the learning rate scheduler, defaults to 0.6
- device: device to use for computation, defaults to cuda
- figsize: size of the pyplot figure, defaults to (12, 12)
- save_path: path where the generated plot is saved, defaults to None
A given input image is modified in order to maximize the outputs of module. This is a very simplistic implementation of Deep Dream. For the same input image, the outputs of this function may vary depending on the model weights.
dream = deep_dream(model = model, module = model.blocks[-2][0].conv_dw,
                   img_path = 'chow.jpg', epochs = 100, lr = 2)
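The "maximize a layer's activations" objective combines the hook trick from visualize_activations with the ascent loop from generate_image. A hedged sketch on a toy network (illustrative names, not the library's code): record the module's output with a hook and ascend on its L2 norm.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy two-conv network; the first conv plays the role of `module`.
model = nn.Sequential(
    nn.Conv2d(3, 4, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(4, 4, 3, padding=1),
)
target = model[0]  # module whose activations we amplify

acts = {}
target.register_forward_hook(lambda m, i, o: acts.update(out=o))

x = torch.randn(1, 3, 16, 16, requires_grad=True)  # the input being "dreamed"
opt = torch.optim.SGD([x], lr=0.5)

model(x)
start = acts["out"].norm().item()
for _ in range(30):
    opt.zero_grad()
    model(x)                              # hook refreshes acts["out"]
    (-acts["out"].norm()).backward()      # ascend on the activation norm
    opt.step()
model(x)
end = acts["out"].detach().norm().item()
print(end > start)  # True: the layer's activations got stronger
```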
The above functions will work for any PyTorch image classifier, though you may have to tune the hyperparameters for different models and functions. If you notice any bugs or missing citations, or have feedback, code optimizations, or feature requests, please let me know through GitHub.